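The Complete Guide to Service Permission Management in AWS EKS
When services running in EKS need to access AWS resources such as S3, DynamoDB, or SQS, how should they be granted permissions? The question looks simple, but there are several options, and each differs in security boundary, operational complexity, and suitable use cases. Choose wrong and the consequences range from over-broad permissions that leave a security hole to services that fail to start. This article walks through every option from coarse to fine and gives the full AWS-side and K8s-side configuration for each.
1. Overview of Permission Options
| Option | Granularity | Security | Complexity | Recommendation |
|---|---|---|---|---|
| Node Group IAM Role | Node level (shared by all Pods) | Low | Low | ⭐⭐ |
| IRSA (IAM Roles for Service Accounts) | ServiceAccount level | High | Medium | ⭐⭐⭐⭐⭐ |
| EKS Pod Identity | ServiceAccount level | High | Low | ⭐⭐⭐⭐⭐ |
| kube2iam / kiam | Pod level (annotation) | Medium | High | ⭐⭐ (obsolete) |
| Hard-coded Access Key | Container level | Very low | Low | ⭐ (never use) |
The conclusion up front: for new projects from 2024 onward, prefer EKS Pod Identity; for existing projects, use IRSA; use the Node Group Role only for permissions the node itself needs (such as pulling images from ECR); never use Access Keys.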
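2. Option 1: Node Group IAM Role (Node-Level Permissions)
2.1 How It Works
Each Node Group is associated with an IAM Role, and every Pod on those nodes can obtain temporary credentials for that Role through the EC2 Instance Metadata Service (IMDS). This is the simplest and also the crudest approach.
EC2 Node (Node Group), IAM Role: eks-node-role
  Pod A (needs S3)
  Pod B (needs SQS)
  ⚠ Both Pods can access S3 and SQS (over-privileged)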
2.2 Terraform Configuration: IAM Role
# IAM Role for the Node Group
resource "aws_iam_role" "eks_node" {
  name = "${var.cluster_name}-node-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

# Base policies every EKS node needs (required)
resource "aws_iam_role_policy_attachment" "node_AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks_node.name
}

resource "aws_iam_role_policy_attachment" "node_AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_node.name
}

resource "aws_iam_role_policy_attachment" "node_AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks_node.name
}

# Granting application permissions through the Node Group Role
# (not recommended, but sometimes unavoidable)
resource "aws_iam_role_policy" "node_s3_access" {
  name = "s3-access"
  role = aws_iam_role.eks_node.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ]
      Resource = [
        "arn:aws:s3:::my-app-bucket",
        "arn:aws:s3:::my-app-bucket/*"
      ]
    }]
  })
}

# Node Group
resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "main"
  node_role_arn   = aws_iam_role.eks_node.arn
  subnet_ids      = var.private_subnet_ids
  scaling_config {
    desired_size = 3
    max_size     = 10
    min_size     = 1
  }
  instance_types = ["m5.large"]
}
2.3 Deployment Configuration
With the Node Group Role, the Deployment needs no special configuration; Pods automatically inherit the node's IAM Role:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: 123456789.dkr.ecr.ap-southeast-1.amazonaws.com/my-app:latest
          # No IAM-related configuration is needed
          # The Pod obtains the node's IAM Role credentials through IMDS
          env:
            - name: AWS_DEFAULT_REGION
              value: "ap-southeast-1"
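2.4 Problems and Risks
- Permission sprawl: all Pods on a node share one Role. If Pod A needs S3 and Pod B needs DynamoDB, the Node Role must carry both S3 and DynamoDB permissions, so Pod A can also access DynamoDB
- Lateral movement risk: an attacker who compromises any one Pod gains every AWS permission held by the node
- IMDS attacks: a Pod can reach 169.254.169.254 directly to obtain credentials. IMDSv2 mitigates part of the risk, but the underlying problem remains
When should you use the Node Group Role? Only for permissions the node itself needs to run: pulling ECR images, the CNI network plugin, and CloudWatch logs. Do not put application permissions here.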
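The permission-sprawl problem can be made concrete with a few lines of Python. This is a minimal sketch, not an AWS API call: the workload names and actions in `NEEDS` are hypothetical, and the functions only model the fact that a shared node role must hold the union of every Pod's needs.

```python
# Hypothetical workloads and the AWS actions each one actually needs.
NEEDS = {
    "pod-a": {"s3:GetObject"},
    "pod-b": {"dynamodb:GetItem"},
}


def node_role_permissions(needs):
    """With a shared node role, every pod gets the union of all needs."""
    union = set()
    for actions in needs.values():
        union |= actions
    return {pod: set(union) for pod in needs}


def excess_permissions(needs):
    """Actions each pod can use through the node role but does not need."""
    granted = node_role_permissions(needs)
    return {pod: granted[pod] - needs[pod] for pod in needs}


if __name__ == "__main__":
    for pod, extra in excess_permissions(NEEDS).items():
        print(pod, "can also use:", sorted(extra))
```

With per-ServiceAccount roles the excess set for every workload is empty, which is the property the later sections aim for.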
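2.5 Restricting IMDS Access (Security Hardening)
If you grant Pod permissions through IRSA or Pod Identity, you should also block Pods from reaching IMDS so they cannot "steal" the node's Role:
# Node Group launch template: enforce IMDSv2 and limit the hop count
resource "aws_launch_template" "eks_node" {
  name_prefix = "${var.cluster_name}-node-"
  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required" # enforce IMDSv2
    http_put_response_hop_limit = 1          # key setting: with 1, containers on the Pod network cannot reach IMDS
  }
  # ... other settings
}
hop_limit = 1 means IMDS responses can only reach processes on the EC2 instance itself; requests from containers on the Pod network (one extra hop) are refused. Pods that set hostNetwork: true share the node's network namespace and are not blocked by this setting. This is a security best practice when using IRSA/Pod Identity.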
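The hop-limit rule can be illustrated with a deliberately simplified model. The sketch below assumes the IMDS response leaves with a TTL equal to the hop limit and loses one per extra network hop; the function name and the two constants are hypothetical and only serve the illustration.

```python
HOST_PROCESS = 0       # process on the instance itself (or a hostNetwork Pod)
BRIDGED_CONTAINER = 1  # Pod behind the container network: one additional hop


def imds_response_reaches(caller_extra_hops: int, hop_limit: int) -> bool:
    """Simplified model: the response starts with TTL == hop_limit and each
    extra hop decrements it; it is delivered only if the TTL is still >= 1."""
    return hop_limit - caller_extra_hops >= 1


if __name__ == "__main__":
    for limit in (1, 2):
        print("hop_limit", limit,
              "host:", imds_response_reaches(HOST_PROCESS, limit),
              "container:", imds_response_reaches(BRIDGED_CONTAINER, limit))
```

Under this model a hop limit of 2 lets bridged containers through again, which is why the value 1 matters.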
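3. Option 2: IRSA, IAM Roles for Service Accounts (Recommended)
3.1 How It Works
IRSA is the AWS-recommended way to grant permissions at the Pod level, scoped by ServiceAccount. The core idea:
- The EKS cluster has an OIDC Provider
- You create an IAM Role whose trust policy lets only a specific ServiceAccount in a specific namespace assume it
- The K8s ServiceAccount carries an annotation pointing to that IAM Role
- Once a Pod uses this ServiceAccount, the AWS SDK obtains temporary credentials for the Role through STS automatically
EC2 Node
  Pod A | SA: s3-reader | Role: S3ReadRole → can only read S3
  Pod B | SA: sqs-writer | Role: SQSRole → can only write to SQS
  ✅ Each Pod has its own least-privilege Role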
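The trust check at the heart of IRSA can be sketched in plain Python. This is an illustrative model of the `StringEquals` conditions on `sub` and `aud`, not the IAM evaluation engine; `expected_subject` and `trust_policy_allows` are hypothetical helper names.

```python
def expected_subject(namespace: str, service_account: str) -> str:
    """Subject claim Kubernetes puts in a ServiceAccount token."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def trust_policy_allows(claims: dict, namespace: str, service_account: str,
                        audience: str = "sts.amazonaws.com") -> bool:
    """Exact, case-sensitive match on both sub and aud (StringEquals)."""
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return (claims.get("sub") == expected_subject(namespace, service_account)
            and audience in audiences)


if __name__ == "__main__":
    token_claims = {
        "sub": "system:serviceaccount:production:order-service",
        "aud": ["sts.amazonaws.com"],
    }
    print(trust_policy_allows(token_claims, "production", "order-service"))
```

A token issued to the same ServiceAccount name in another namespace fails the check, which is what keeps the Role bound to one workload.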
3.2 Step 1: Create the OIDC Provider
# Look up the EKS cluster's OIDC information
data "aws_eks_cluster" "main" {
  name = var.cluster_name
}

data "tls_certificate" "eks" {
  url = data.aws_eks_cluster.main.identity[0].oidc[0].issuer
}

# Create the OIDC Provider (only once per cluster)
resource "aws_iam_openid_connect_provider" "eks" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
  url             = data.aws_eks_cluster.main.identity[0].oidc[0].issuer
  tags = {
    Cluster = var.cluster_name
  }
}
3.3 Step 2: Create the IAM Role (with an OIDC Trust Policy)
locals {
  oidc_provider_arn = aws_iam_openid_connect_provider.eks.arn
  oidc_provider_url = replace(
    data.aws_eks_cluster.main.identity[0].oidc[0].issuer,
    "https://", ""
  )
}

# Dedicated IAM Role for order-service
resource "aws_iam_role" "order_service" {
  name = "${var.cluster_name}-order-service"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = local.oidc_provider_arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          # Restriction: only the order-service SA in the production namespace may assume this Role
          "${local.oidc_provider_url}:sub" = "system:serviceaccount:production:order-service"
          "${local.oidc_provider_url}:aud" = "sts.amazonaws.com"
        }
      }
    }]
  })
}

# Grant specific permissions: only the order-related DynamoDB table and SQS queue
resource "aws_iam_role_policy" "order_service" {
  name = "order-service-policy"
  role = aws_iam_role.order_service.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "dynamodb:GetItem",
          "dynamodb:PutItem",
          "dynamodb:UpdateItem",
          "dynamodb:Query"
        ]
        Resource = [
          "arn:aws:dynamodb:ap-southeast-1:123456789:table/orders",
          "arn:aws:dynamodb:ap-southeast-1:123456789:table/orders/index/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "sqs:SendMessage",
          "sqs:ReceiveMessage",
          "sqs:DeleteMessage"
        ]
        Resource = "arn:aws:sqs:ap-southeast-1:123456789:order-events"
      }
    ]
  })
}
3.4 Step 3: Create the K8s ServiceAccount
# service-account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-service
  namespace: production
  annotations:
    # Key line: this annotation links the SA to the IAM Role
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/my-cluster-order-service
  labels:
    app: order-service
3.5 Step 4: Use the ServiceAccount in the Deployment
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      serviceAccountName: order-service  # key line: selects the ServiceAccount
      containers:
        - name: order-service
          image: 123456789.dkr.ecr.ap-southeast-1.amazonaws.com/order-service:v1.2.0
          ports:
            - containerPort: 8080
          env:
            - name: AWS_DEFAULT_REGION
              value: "ap-southeast-1"
            # No need to set AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY
            # IRSA injects these environment variables automatically:
            #   AWS_ROLE_ARN
            #   AWS_WEB_IDENTITY_TOKEN_FILE
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
How IRSA works: the EKS Mutating Webhook automatically injects a projected volume (containing a JWT token) and two environment variables into the Pod. When the AWS SDK detects these variables, it calls STS AssumeRoleWithWebIdentity to obtain temporary credentials. The whole process is transparent to application code.
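To see what the injected token actually asserts, its payload can be decoded locally. The sketch below uses only the standard library and does not verify the signature, so it is a debugging aid rather than a security check; the helper names are hypothetical, and the path you pass in would be the value of `AWS_WEB_IDENTITY_TOKEN_FILE`.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def read_projected_token_claims(path: str) -> dict:
    """Decode the claims of the token file mounted into the Pod."""
    with open(path, encoding="utf-8") as f:
        return decode_jwt_claims(f.read())
```

Inside a Pod, `read_projected_token_claims(os.environ["AWS_WEB_IDENTITY_TOKEN_FILE"])` should show the `sub` and `aud` values that the Role's trust policy is matched against.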
3.6 Verifying That IRSA Works
# Check the environment variables inside the Pod
kubectl exec -it deploy/order-service -n production -- env | grep AWS
# Expected output:
# AWS_ROLE_ARN=arn:aws:iam::123456789:role/my-cluster-order-service
# AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
# AWS_DEFAULT_REGION=ap-southeast-1

# Check the Pod's actual identity
kubectl exec -it deploy/order-service -n production -- \
  aws sts get-caller-identity
# This should return the IRSA Role, not the Node Role:
# {
#   "UserId": "AROA...:botocore-session-...",
#   "Account": "123456789",
#   "Arn": "arn:aws:sts::123456789:assumed-role/my-cluster-order-service/..."
# }

# Check that the projected volume is mounted
kubectl get pod -n production -l app=order-service -o yaml | grep -A5 "projected"
3.7 Modularizing IRSA with Terraform
With many services, writing this out for each one gets tedious. Wrap it in a module:
# modules/irsa/main.tf
variable "cluster_name" {}
variable "oidc_provider_arn" {}
variable "oidc_provider_url" {}
variable "namespace" {}
variable "service_account_name" {}
variable "policy_json" {}

resource "aws_iam_role" "this" {
  name = "${var.cluster_name}-${var.namespace}-${var.service_account_name}"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = var.oidc_provider_arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "${var.oidc_provider_url}:sub" = "system:serviceaccount:${var.namespace}:${var.service_account_name}"
          "${var.oidc_provider_url}:aud" = "sts.amazonaws.com"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy" "this" {
  name   = "${var.service_account_name}-policy"
  role   = aws_iam_role.this.id
  policy = var.policy_json
}

output "role_arn" {
  value = aws_iam_role.this.arn
}

# ============================================
# Calling the module: one module block per service
# ============================================
module "irsa_order_service" {
  source               = "./modules/irsa"
  cluster_name         = var.cluster_name
  oidc_provider_arn    = aws_iam_openid_connect_provider.eks.arn
  oidc_provider_url    = local.oidc_provider_url
  namespace            = "production"
  service_account_name = "order-service"
  policy_json = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["dynamodb:*"] # wildcard kept for brevity; narrow it as in section 7.1
      Resource = "arn:aws:dynamodb:*:*:table/orders*"
    }]
  })
}

module "irsa_payment_service" {
  source               = "./modules/irsa"
  cluster_name         = var.cluster_name
  oidc_provider_arn    = aws_iam_openid_connect_provider.eks.arn
  oidc_provider_url    = local.oidc_provider_url
  namespace            = "production"
  service_account_name = "payment-service"
  policy_json = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["sqs:*"] # wildcard kept for brevity; narrow it as in section 7.1
        Resource = "arn:aws:sqs:*:*:payment-*"
      },
      {
        Effect   = "Allow"
        Action   = ["kms:Decrypt", "kms:GenerateDataKey"]
        Resource = "arn:aws:kms:ap-southeast-1:123456789:key/xxx"
      }
    ]
  })
}
4. Option 3: EKS Pod Identity (Newest Recommendation)
4.1 How It Works
EKS Pod Identity, launched in late 2023, aims to simplify IRSA's setup. The key differences:
- No OIDC Provider: there is nothing to create or manage
- Simpler trust policy: no OIDC URL in the IAM Role's trust policy
- Association through the EKS API: the aws_eks_pod_identity_association resource links the Role to the ServiceAccount
4.2 Step 1: Install the Pod Identity Agent
# EKS Pod Identity Agent Add-on
resource "aws_eks_addon" "pod_identity_agent" {
  cluster_name = aws_eks_cluster.main.name
  addon_name   = "eks-pod-identity-agent"
  # On update, overwrite conflicting add-on settings with the EKS defaults
  resolve_conflicts_on_update = "OVERWRITE"
}
4.3 Step 2: Create the IAM Role
# Note: the trust policy is much simpler than with IRSA
resource "aws_iam_role" "notification_service" {
  name = "${var.cluster_name}-notification-service"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Service = "pods.eks.amazonaws.com" # fixed value, no OIDC URL needed
      }
      Action = [
        "sts:AssumeRole",
        "sts:TagSession" # required by Pod Identity
      ]
    }]
  })
}

# Permission policy
resource "aws_iam_role_policy" "notification_service" {
  name = "notification-policy"
  role = aws_iam_role.notification_service.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "sns:Publish"
        ]
        Resource = "arn:aws:sns:ap-southeast-1:123456789:user-notifications"
      },
      {
        Effect = "Allow"
        Action = [
          "ses:SendEmail",
          "ses:SendRawEmail"
        ]
        Resource = "*"
        Condition = {
          StringEquals = {
            "ses:FromAddress" = "[email]"
          }
        }
      }
    ]
  })
}
4.4 Step 3: Create the Pod Identity Association
# This step replaces the ServiceAccount annotation used by IRSA
resource "aws_eks_pod_identity_association" "notification_service" {
  cluster_name    = aws_eks_cluster.main.name
  namespace       = "production"
  service_account = "notification-service"
  role_arn        = aws_iam_role.notification_service.arn
}
4.5 K8s-Side Configuration
# service-account.yaml (note: no annotation needed)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: notification-service
  namespace: production
  # No eks.amazonaws.com/role-arn annotation is needed
  # Pod Identity is associated through the AWS API, not through a K8s annotation
---
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: notification-service
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: notification-service
  template:
    metadata:
      labels:
        app: notification-service
    spec:
      serviceAccountName: notification-service
      containers:
        - name: notification-service
          image: 123456789.dkr.ecr.ap-southeast-1.amazonaws.com/notification:v2.0
          ports:
            - containerPort: 8080
          env:
            - name: AWS_DEFAULT_REGION
              value: "ap-southeast-1"
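4.6 IRSA vs Pod Identity
| Dimension | IRSA | Pod Identity |
|---|---|---|
| OIDC Provider | Must be created and managed | Not needed |
| IAM trust policy | Contains the OIDC URL (cluster-specific) | Fixed pods.eks.amazonaws.com |
| Association method | ServiceAccount annotation | aws_eks_pod_identity_association |
| Reusing a Role across clusters | Not as-is (the trust policy is bound to one cluster's OIDC provider and must be edited per cluster) | Yes (the trust policy carries no cluster information) |
| Fargate support | Supported | Not supported (as of early 2025) |
| Minimum EKS version | 1.13+ | 1.24+ |
| SDK requirement | An SDK version with web identity token support | A newer SDK version with Pod Identity support |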
5. In Practice: Complete Configuration for Multiple Services in One Cluster
Suppose your EKS cluster runs 4 services, each needing different AWS permissions:
| Service | Namespace | AWS permissions needed |
|---|---|---|
| order-service | production | DynamoDB (orders table) + SQS (order-events) |
| payment-service | production | SQS (payment-queue) + KMS (decrypt) |
| image-processor | production | S3 (upload/download images) + Rekognition |
| log-shipper | kube-system | CloudWatch Logs + Kinesis Firehose |
5.1 Complete Terraform Configuration
# ============================================
# All IRSA Role definitions
# ============================================
# 1. order-service
module "irsa_order" {
  source               = "./modules/irsa"
  cluster_name         = var.cluster_name
  oidc_provider_arn    = aws_iam_openid_connect_provider.eks.arn
  oidc_provider_url    = local.oidc_provider_url
  namespace            = "production"
  service_account_name = "order-service"
  policy_json = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "DynamoDBAccess"
        Effect = "Allow"
        Action = [
          "dynamodb:GetItem", "dynamodb:PutItem",
          "dynamodb:UpdateItem", "dynamodb:Query",
          "dynamodb:BatchGetItem"
        ]
        Resource = [
          "arn:aws:dynamodb:${var.region}:${data.aws_caller_identity.current.account_id}:table/orders",
          "arn:aws:dynamodb:${var.region}:${data.aws_caller_identity.current.account_id}:table/orders/index/*"
        ]
      },
      {
        Sid      = "SQSAccess"
        Effect   = "Allow"
        Action   = ["sqs:SendMessage"]
        Resource = "arn:aws:sqs:${var.region}:${data.aws_caller_identity.current.account_id}:order-events"
      }
    ]
  })
}

# 2. payment-service
module "irsa_payment" {
  source               = "./modules/irsa"
  cluster_name         = var.cluster_name
  oidc_provider_arn    = aws_iam_openid_connect_provider.eks.arn
  oidc_provider_url    = local.oidc_provider_url
  namespace            = "production"
  service_account_name = "payment-service"
  policy_json = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "SQSAccess"
        Effect = "Allow"
        Action = [
          "sqs:ReceiveMessage", "sqs:DeleteMessage",
          "sqs:GetQueueAttributes"
        ]
        Resource = "arn:aws:sqs:${var.region}:${data.aws_caller_identity.current.account_id}:payment-*"
      },
      {
        Sid      = "KMSDecrypt"
        Effect   = "Allow"
        Action   = ["kms:Decrypt"]
        Resource = aws_kms_key.payment.arn
      }
    ]
  })
}

# 3. image-processor
module "irsa_image" {
  source               = "./modules/irsa"
  cluster_name         = var.cluster_name
  oidc_provider_arn    = aws_iam_openid_connect_provider.eks.arn
  oidc_provider_url    = local.oidc_provider_url
  namespace            = "production"
  service_account_name = "image-processor"
  policy_json = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid      = "S3Access"
        Effect   = "Allow"
        Action   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
        Resource = "arn:aws:s3:::${var.image_bucket}/*"
      },
      {
        Sid      = "S3ListBucket"
        Effect   = "Allow"
        Action   = ["s3:ListBucket"]
        Resource = "arn:aws:s3:::${var.image_bucket}"
      },
      {
        Sid    = "RekognitionAccess"
        Effect = "Allow"
        Action = [
          "rekognition:DetectLabels", "rekognition:DetectFaces",
          "rekognition:DetectModerationLabels"
        ]
        Resource = "*"
      }
    ]
  })
}

# 4. log-shipper (kube-system namespace)
module "irsa_log_shipper" {
  source               = "./modules/irsa"
  cluster_name         = var.cluster_name
  oidc_provider_arn    = aws_iam_openid_connect_provider.eks.arn
  oidc_provider_url    = local.oidc_provider_url
  namespace            = "kube-system"
  service_account_name = "log-shipper"
  policy_json = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents",
          "logs:DescribeLogStreams"
        ]
        Resource = "arn:aws:logs:${var.region}:${data.aws_caller_identity.current.account_id}:log-group:/eks/${var.cluster_name}/*"
      },
      {
        Effect   = "Allow"
        Action   = ["firehose:PutRecord", "firehose:PutRecordBatch"]
        Resource = "arn:aws:firehose:${var.region}:${data.aws_caller_identity.current.account_id}:deliverystream/eks-logs"
      }
    ]
  })
}
5.2 Complete K8s YAML
# All ServiceAccounts
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-service
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/my-cluster-production-order-service
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-service
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/my-cluster-production-payment-service
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: image-processor
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/my-cluster-production-image-processor
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: log-shipper
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/my-cluster-kube-system-log-shipper
---
# order-service Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      serviceAccountName: order-service
      # Keeps the default Kubernetes API token mounted; as a hardening step,
      # set this to false if the app never calls the Kubernetes API
      automountServiceAccountToken: true
      containers:
        - name: order-service
          image: 123456789.dkr.ecr.ap-southeast-1.amazonaws.com/order-service:v1.2.0
          ports:
            - containerPort: 8080
          env:
            - name: AWS_DEFAULT_REGION
              value: "ap-southeast-1"
            - name: DYNAMODB_TABLE
              value: "orders"
            - name: SQS_QUEUE_URL
              value: "https://sqs.ap-southeast-1.amazonaws.com/123456789/order-events"
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 512Mi
---
# image-processor Deployment (needs more resources)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: image-processor
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: image-processor
  template:
    metadata:
      labels:
        app: image-processor
    spec:
      serviceAccountName: image-processor
      containers:
        - name: image-processor
          image: 123456789.dkr.ecr.ap-southeast-1.amazonaws.com/image-processor:v3.1
          env:
            - name: AWS_DEFAULT_REGION
              value: "ap-southeast-1"
            - name: S3_BUCKET
              value: "my-app-images"
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "2"
              memory: 2Gi
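The Role ARN in each annotation has to match the name the Terraform module generates, and a typo here shows up only at runtime. A minimal sketch of deriving both from one place follows; the function names are hypothetical, the naming pattern is the module's `cluster-namespace-serviceaccount` convention, and the 64-character check reflects the IAM role name length limit.

```python
import json

IAM_ROLE_NAME_MAX = 64  # IAM role names are limited to 64 characters


def irsa_role_name(cluster: str, namespace: str, service_account: str) -> str:
    """Role name following the module's <cluster>-<namespace>-<sa> pattern."""
    name = f"{cluster}-{namespace}-{service_account}"
    if len(name) > IAM_ROLE_NAME_MAX:
        raise ValueError(f"role name too long ({len(name)} chars): {name}")
    return name


def irsa_role_arn(account_id: str, cluster: str, namespace: str,
                  service_account: str) -> str:
    role = irsa_role_name(cluster, namespace, service_account)
    return f"arn:aws:iam::{account_id}:role/{role}"


def service_account_manifest(account_id: str, cluster: str, namespace: str,
                             service_account: str) -> dict:
    """ServiceAccount manifest whose annotation matches the generated Role."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": service_account,
            "namespace": namespace,
            "annotations": {
                "eks.amazonaws.com/role-arn": irsa_role_arn(
                    account_id, cluster, namespace, service_account),
            },
        },
    }


if __name__ == "__main__":
    # JSON is valid input for kubectl apply -f
    print(json.dumps(service_account_manifest(
        "123456789", "my-cluster", "production", "order-service"), indent=2))
```

In a Terraform-only workflow the same effect comes from feeding the module's role_arn output into the manifest instead of hard-coding the ARN.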
6. Handling Special Cases
6.1 Using IRSA with a CronJob
# A CronJob can use IRSA too; the configuration is the same
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-report
  namespace: production
spec:
  schedule: "0 2 * * *"  # every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: report-generator  # dedicated SA
          containers:
            - name: report
              image: 123456789.dkr.ecr.ap-southeast-1.amazonaws.com/report:v1
              command: ["python", "generate_report.py"]
              env:
                - name: AWS_DEFAULT_REGION
                  value: "ap-southeast-1"
          restartPolicy: OnFailure
6.2 Sharing One ServiceAccount Across Multiple Deployments
If several Deployments need the same AWS permissions, they can share one ServiceAccount:
# Shared SA: api-gateway and api-worker both need the same S3 bucket
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-shared
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/my-cluster-api-shared
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      serviceAccountName: api-shared  # shared
      containers:
        - name: api-gateway
          image: 123456789.dkr.ecr.ap-southeast-1.amazonaws.com/api-gateway:v1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-worker
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-worker
  template:
    metadata:
      labels:
        app: api-worker
    spec:
      serviceAccountName: api-shared  # same shared SA
      containers:
        - name: api-worker
          image: 123456789.dkr.ecr.ap-southeast-1.amazonaws.com/api-worker:v1
6.3 Cross-Account Access
The service runs in EKS in account A but needs to access S3 in account B:
# Account A: the IRSA Role is allowed to assume the Role in account B
module "irsa_cross_account" {
  source               = "./modules/irsa"
  cluster_name         = var.cluster_name
  oidc_provider_arn    = aws_iam_openid_connect_provider.eks.arn
  oidc_provider_url    = local.oidc_provider_url
  namespace            = "production"
  service_account_name = "data-sync"
  policy_json = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "sts:AssumeRole"
      Resource = "arn:aws:iam::999888777:role/cross-account-s3-access"
    }]
  })
}

# Account B: create the Role to be assumed
resource "aws_iam_role" "cross_account_s3" {
  provider = aws.account_b
  name     = "cross-account-s3-access"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        AWS = "arn:aws:iam::123456789:role/my-cluster-production-data-sync"
      }
      Action = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "cross_account_s3" {
  provider = aws.account_b
  role     = aws_iam_role.cross_account_s3.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject", "s3:ListBucket"]
      Resource = ["arn:aws:s3:::account-b-data", "arn:aws:s3:::account-b-data/*"]
    }]
  })
}
# Application code: assume the Role in account B first, then access S3
import boto3

# IRSA supplies account A's credentials automatically
sts = boto3.client('sts')

# Assume the Role in account B
assumed = sts.assume_role(
    RoleArn='arn:aws:iam::999888777:role/cross-account-s3-access',
    RoleSessionName='data-sync'
)

# Access S3 with account B's temporary credentials
s3 = boto3.client(
    's3',
    aws_access_key_id=assumed['Credentials']['AccessKeyId'],
    aws_secret_access_key=assumed['Credentials']['SecretAccessKey'],
    aws_session_token=assumed['Credentials']['SessionToken']
)
data = s3.get_object(Bucket='account-b-data', Key='export/latest.csv')
6.4 Init Containers That Need Different Permissions
# Scenario: the init container pulls a secret from Secrets Manager
# and the main container accesses DynamoDB.
# Both share the same SA, so the Role must include both sets of permissions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
  namespace: production
spec:
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      serviceAccountName: secure-app  # Role includes SecretsManager + DynamoDB permissions
      initContainers:
        - name: secret-fetcher
          image: 123456789.dkr.ecr.ap-southeast-1.amazonaws.com/secret-fetcher:v1
          command: ["sh", "-c"]
          args:
            - |
              aws secretsmanager get-secret-value \
                --secret-id prod/db-credentials \
                --query SecretString --output text > /secrets/db-creds.json
          volumeMounts:
            - name: secrets
              mountPath: /secrets
      containers:
        - name: app
          image: 123456789.dkr.ecr.ap-southeast-1.amazonaws.com/secure-app:v2
          volumeMounts:
            - name: secrets
              mountPath: /secrets
              readOnly: true
      volumes:
        - name: secrets
          emptyDir:
            medium: Memory  # in-memory storage, never written to disk
7. Security Best Practices
7.1 Least Privilege
// Wrong: grants access to all of DynamoDB
{
  "Effect": "Allow",
  "Action": "dynamodb:*",
  "Resource": "*"
}
// Right: only the operations needed, on the specific table
{
  "Effect": "Allow",
  "Action": [
    "dynamodb:GetItem",
    "dynamodb:Query"
  ],
  "Resource": [
    "arn:aws:dynamodb:ap-southeast-1:123456789:table/orders",
    "arn:aws:dynamodb:ap-southeast-1:123456789:table/orders/index/user-id-index"
  ]
}
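A check like the one above is easy to automate before a policy reaches Terraform. The following is a minimal sketch of such a lint, with a hypothetical function name `find_wildcards`; it only flags `service:*` actions and a bare `*` resource in Allow statements, and treats findings as prompts for review since some actions legitimately require `"Resource": "*"`.

```python
def _as_list(value):
    """IAM allows a single string or a list for Action/Resource/Statement."""
    return value if isinstance(value, list) else [value]


def find_wildcards(policy: dict) -> list:
    """Return human-readable findings for wildcard grants in Allow statements."""
    findings = []
    for i, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {i}: wildcard action {action}")
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append(f"statement {i}: wildcard resource")
    return findings


if __name__ == "__main__":
    too_broad = {"Statement": [
        {"Effect": "Allow", "Action": "dynamodb:*", "Resource": "*"},
    ]}
    for finding in find_wildcards(too_broad):
        print(finding)
```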
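7.2 Condition Restrictions
// Allow access only from a specific VPC
// (aws:SourceVpc is only present when the request arrives through a VPC endpoint)
{
  "Effect": "Allow",
  "Action": ["s3:GetObject"],
  "Resource": "arn:aws:s3:::sensitive-data/*",
  "Condition": {
    "StringEquals": {
      "aws:SourceVpc": "vpc-0123456789abcdef0"
    }
  }
}
// Allow access only to S3 objects under a specific prefix.
// For object actions the prefix is enforced by the Resource ARN; the s3:prefix
// condition key applies to s3:ListBucket, not to GetObject/PutObject.
{
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:PutObject"],
  "Resource": "arn:aws:s3:::my-bucket/tenant-a/*"
}
{
  "Effect": "Allow",
  "Action": ["s3:ListBucket"],
  "Resource": "arn:aws:s3:::my-bucket",
  "Condition": {
    "StringLike": {
      "s3:prefix": ["tenant-a/*"]
    }
  }
}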
7.3 Pod Security Hardening
# A hardened Deployment to use together with IRSA
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-service
  namespace: production
spec:
  selector:
    matchLabels:
      app: secure-service
  template:
    metadata:
      labels:
        app: secure-service
    spec:
      serviceAccountName: secure-service
      # Do not use the host network
      hostNetwork: false
      # Do not use the host PID namespace
      hostPID: false
      securityContext:
        # Pod-level security context
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: app
          image: my-app:v1
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          # A read-only root filesystem still needs a writable tmp directory
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir: {}
7.4 Auditing and Monitoring
# Monitor AssumeRoleWithWebIdentity calls through CloudTrail
resource "aws_cloudwatch_log_metric_filter" "irsa_assume_role" {
  name           = "irsa-assume-role-failures"
  pattern        = "{ ($.eventName = \"AssumeRoleWithWebIdentity\") && ($.errorCode = \"*\") }"
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name
  metric_transformation {
    name      = "IRSAAssumeRoleFailures"
    namespace = "Custom/EKS"
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "irsa_failures" {
  alarm_name          = "eks-irsa-assume-role-failures"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "IRSAAssumeRoleFailures"
  namespace           = "Custom/EKS"
  period              = 300
  statistic           = "Sum"
  threshold           = 5
  alarm_description   = "Abnormal number of IRSA AssumeRole failures; possible unauthorized access attempts"
  alarm_actions       = [aws_sns_topic.alerts.arn]
}
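The logic of the metric filter and alarm above can be replayed offline against exported CloudTrail records, which helps when tuning the threshold. This is a sketch under the assumption that each record is a dict with the `eventName` and optional `errorCode` fields the filter pattern references; the function names are hypothetical.

```python
def is_failed_web_identity_call(event: dict) -> bool:
    """Mirror of the filter: AssumeRoleWithWebIdentity with any errorCode."""
    return (event.get("eventName") == "AssumeRoleWithWebIdentity"
            and bool(event.get("errorCode")))


def count_failures(events) -> int:
    return sum(1 for e in events if is_failed_web_identity_call(e))


def alarm_fires(events, threshold: int = 5) -> bool:
    """GreaterThanThreshold over the Sum of one evaluation period."""
    return count_failures(events) > threshold


if __name__ == "__main__":
    window = [
        {"eventName": "AssumeRoleWithWebIdentity", "errorCode": "AccessDenied"},
        {"eventName": "AssumeRoleWithWebIdentity"},  # successful call
        {"eventName": "GetObject", "errorCode": "AccessDenied"},
    ]
    print(count_failures(window), alarm_fires(window))
```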
8. Troubleshooting Guide
8.1 Diagnosing Common Problems
# Problem: the Pod reports AccessDenied or NoCredentialProviders

# 1. Confirm the ServiceAccount is associated correctly
kubectl get sa order-service -n production -o yaml
# Check that annotations contain eks.amazonaws.com/role-arn

# 2. Confirm the Pod uses the right ServiceAccount
kubectl get pod -n production -l app=order-service -o jsonpath='{.items[0].spec.serviceAccountName}'

# 3. Confirm the IRSA environment variables were injected
kubectl exec -it deploy/order-service -n production -- env | grep AWS_ROLE_ARN
kubectl exec -it deploy/order-service -n production -- env | grep AWS_WEB_IDENTITY_TOKEN_FILE

# 4. Confirm the token file exists
kubectl exec -it deploy/order-service -n production -- \
  cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token | head -c 50

# 5. Confirm the Pod's actual identity
kubectl exec -it deploy/order-service -n production -- \
  aws sts get-caller-identity

# 6. If the identity is the Node Role instead of the IRSA Role:
# - Check that the EKS Mutating Webhook is healthy
kubectl get mutatingwebhookconfigurations | grep eks
# - Check that the OIDC Provider is correct
aws eks describe-cluster --name my-cluster --query "cluster.identity.oidc"
# - Check that the OIDC URL in the IAM Role's trust policy matches

# 7. Test AssumeRoleWithWebIdentity manually
TOKEN=$(kubectl exec deploy/order-service -n production -- \
  cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token)
aws sts assume-role-with-web-identity \
  --role-arn arn:aws:iam::123456789:role/my-cluster-production-order-service \
  --role-session-name test \
  --web-identity-token "$TOKEN"
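8.2 Common Errors and Fixes
| Error message | Cause | Fix |
|---|---|---|
| An error occurred (AccessDenied) when calling the AssumeRoleWithWebIdentity | The OIDC URL or SA name in the IAM Role trust policy does not match | Check that the sub condition in the trust policy is system:serviceaccount:namespace:sa-name |
| NoCredentialProviders | IRSA environment variables were not injected | Check the SA annotation, the EKS webhook, and whether the Pod sets serviceAccountName |
| InvalidIdentityToken | The OIDC Provider thumbprint is expired or does not match | Update the OIDC Provider thumbprint |
| ExpiredTokenException | Token expired (24 hours by default) | Make sure the SDK version refreshes the token automatically; upgrade the AWS SDK |
| Pod uses the Node Role instead of the IRSA Role | The webhook did not inject, or the SA is misconfigured | Run kubectl describe pod and check that volumes include aws-iam-token |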
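When a Pod ends up with the wrong identity, the environment usually tells you which credential source the SDK will pick. The sketch below classifies an environment by the variables that are set. `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` come from the IRSA sections above; `AWS_CONTAINER_CREDENTIALS_FULL_URI` is the variable Pod Identity is documented to inject and is not mentioned elsewhere in this article, so treat it as an assumption to verify. The ordering is a simplification of the SDK default credential chain.

```python
import os


def likely_credential_source(env: dict) -> str:
    """Guess which credential source an AWS SDK would use for this env."""
    if env.get("AWS_ACCESS_KEY_ID"):
        return "static-access-key"   # overrides everything else; remove it
    if env.get("AWS_ROLE_ARN") and env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        return "irsa"
    if env.get("AWS_CONTAINER_CREDENTIALS_FULL_URI"):
        return "pod-identity"        # assumed variable name, see lead-in
    return "node-role-via-imds"      # fallback when nothing was injected


if __name__ == "__main__":
    print(likely_credential_source(dict(os.environ)))
```

A result of `static-access-key` inside a Pod that is supposed to use IRSA explains a surprising caller identity, because explicit keys take precedence over the injected token.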
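9. Summary and Decision Tree
| Scenario | Recommended option |
|---|---|
| Permissions the node itself needs (ECR, CNI, logs) | Node Group IAM Role |
| Application Pod permissions, EKS >= 1.24 and no Fargate | EKS Pod Identity (simplest) |
| Application Pod permissions, older EKS version or Fargate | IRSA (most mature) |
| Cross-account access needed | IRSA / Pod Identity + AssumeRole chain |
| Someone proposes using Access Keys | Refuse, no exceptions |
Core principle: one ServiceAccount per service, and one least-privilege IAM Role per ServiceAccount. This is the foundation of EKS security. The setup takes more effort than adding permissions to the Node Group, but you will thank yourself for the choice during security audits, troubleshooting, and permission revocation.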
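The decision table can also be written as a small function, which is handy in platform tooling or review checklists. This is only a sketch of the table's logic; the function name and parameter names are hypothetical.

```python
def recommend_scheme(purpose: str, eks_minor: int = 0,
                     uses_fargate: bool = False) -> str:
    """purpose is "node" (ECR, CNI, logs) or "workload" (application Pods);
    eks_minor is the minor version number, e.g. 24 for EKS 1.24."""
    if purpose == "node":
        return "Node Group IAM Role"
    if purpose == "workload":
        if eks_minor >= 24 and not uses_fargate:
            return "EKS Pod Identity"
        return "IRSA"
    raise ValueError(f"unknown purpose: {purpose}")


if __name__ == "__main__":
    print(recommend_scheme("workload", eks_minor=29))
    print(recommend_scheme("workload", eks_minor=29, uses_fargate=True))
```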